Extracting and Using Attribute-Value Pairs from Product Descriptions on the Web
Authors
Abstract
We describe an approach to extract attribute-value pairs from product descriptions in order to augment product databases by representing each product as a set of attribute-value pairs. Such a representation is useful for a variety of tasks where treating a product as a set of attribute-value pairs is more useful than as an atomic entity. We formulate the extraction task as a classification problem and use Naïve Bayes combined with a multi-view semi-supervised algorithm (co-EM). The extraction system requires very little initial user supervision: using unlabeled data, we automatically extract an initial seed list that serves as training data for the semi-supervised classification algorithm. The extracted attributes and values are then linked to form pairs using dependency information and co-location scores. We present promising results on product descriptions in two categories of sporting goods products. The extracted attribute-value pairs can be useful in a variety of applications, including product recommendations, product comparisons, and demand forecasting. In this paper, we describe one practical application of the extracted attribute-value pairs: a prototype of an Assortment Comparison Tool that allows retailers to compare their product assortments to those of their competitors. As the comparison is based on attributes and values, we can draw meaningful conclusions at a very fine-grained level. We present the details and research issues of such a tool, as well as the current state of our
Similar resources
Extracting Attribute-Value Pairs from Product Specifications on the Web
Comparison shopping portals integrate product offers from large numbers of e-shops in order to support consumers in their buying decisions. Product offers often consist of a title and a free-text product description, both describing product attributes that are considered relevant by the specific vendor. In addition, product offers might contain structured or semi-structured product specifications in...
Semi-Supervised Learning to Extract Attribute-Value Pairs from Product Descriptions on the Web
We describe an approach to extract attribute-value pairs from product descriptions on the Web. The goal is to augment product databases by representing each product as a set of such attribute-value pairs. Such a representation is useful for a variety of tasks where treating the product as a set of attribute-value pairs is more useful than as an atomic entity. Examples include product recommenda...
IRIS: A Protégé Plug-in to Extract and Serialize Product Attribute Name-Value Pairs
This article introduces the IRIS wrapper, which is developed as a Protégé plug-in, to solve an increasingly important problem: extracting information from the product descriptions provided by online sources and structuring this information so that it is sharable among business entities, software agents and search engines. Extracted product information is presented in a GoodRelations-compliant ontology...
Semi-Supervised Learning of Attribute-Value Pairs from Product Descriptions
We describe an approach to extract attribute-value pairs from product descriptions. This allows us to represent products as sets of such attribute-value pairs to augment product databases. Such a representation is useful for a variety of tasks where treating a product as a set of attribute-value pairs is more useful than as an atomic entity. Examples of such applications include product recomme...
Using Text Reviews for Product Entity Completion
In this paper we address the problem of obtaining structured information about products in the form of attribute-value pairs by leveraging a combination of enterprise internal product descriptions and external data. Product descriptions are short text strings used internally within enterprises to describe a product. These strings usually comprise of the Brand name, name of the product, and its ...